Word Sense Disambiguation as a Wordnets' Validation Method in Balkanet

Authors

  • Dan Tufis
  • Radu Ion
  • Nancy Ide
Abstract

BalkaNet is a European project which aims at the development of monolingual wordnets for five languages in the Balkans area (Bulgarian, Greek, Romanian, Serbian, and Turkish) and at the improvement of the Czech wordnet developed in the EuroWordNet project. The wordnets are aligned to the Princeton Wordnet, according to the principles established by the EuroWordNet consortium. One of the main concerns of this project is the interlingual validation of the wordnets' alignment. To this end, we have developed a WSD system based on parallel corpora which exploits the common intuition that words which are reciprocal translations in parallel texts should have the same (or closely related) interlingual meanings. With the wordnets still under construction, our WSD system is mainly a validation tool, pinpointing wrong interlingual alignments and incomplete or missing synsets in one or another of the wordnets.
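The validation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' system: it assumes each wordnet is a plain dictionary from a word to the set of interlingual index (ILI) codes of its synsets, and it only tests for identical codes, ignoring the "closely related" case the abstract mentions. The function names, the status labels, and the ILI codes in the toy data are all hypothetical.

```python
def shared_ili(senses_a, senses_b):
    """Return the interlingual codes common to both sense inventories."""
    return set(senses_a) & set(senses_b)


def check_pair(word_a, word_b, wordnet_a, wordnet_b):
    """Classify a translation pair by the ILI codes its two words share.

    Exactly one shared code disambiguates both words; several leave the
    pair ambiguous; none flags a possible wrong alignment or a missing
    synset in one of the wordnets.
    """
    common = shared_ili(wordnet_a.get(word_a, ()), wordnet_b.get(word_b, ()))
    if len(common) == 1:
        status = "disambiguated"
    elif common:
        status = "ambiguous"
    else:
        status = "check_alignment"
    return status, common


# Toy data: the ILI codes are invented for illustration only.
english = {"bank": {"ILI-1", "ILI-2"}, "river": {"ILI-4"}}
romanian = {"mal": {"ILI-2"}, "bancă": {"ILI-1", "ILI-3"}, "râu": {"ILI-5"}}

print(check_pair("bank", "mal", english, romanian))
print(check_pair("river", "râu", english, romanian))
```

In this toy data the pair "bank"/"mal" shares a single code, so both words are disambiguated, while "river"/"râu" shares none, which is the kind of case a validation pass would hand to a lexicographer for inspection.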

Similar resources

Fine-Grained Word Sense Disambiguation Based on Parallel Corpora, Word Alignment, Word Clustering and Aligned Wordnets

The paper presents a method for word sense disambiguation based on parallel corpora. The method exploits recent advances in word alignment and word clustering based on automatic extraction of translation equivalents and being supported by available aligned wordnets for the languages in the corpus. The wordnets are aligned to the Princeton Wordnet, according to the principles established by Euro...

News about the Romanian Wordnet

There are more than 60 wordnets worldwide; the Romanian wordnet is among those that are maintained and further developed. Begun within the BalkaNet project and further enriched in various (application oriented) projects, it was used in word sense disambiguation, machine translation and question answering with promising results. We present here the latest qualitative and quantitative improvement...

From Word Alignment to Word Senses, via Multilingual Wordnets

Most of the successful commercial applications in language processing (text and/or speech) dispense with any explicit concern for semantics, with the usual motivations stemming from the high computational costs required, in the case of large volumes of data, for dealing with semantics. With recent advances in corpus linguistics and statistical-based methods in NLP, revealing useful semantic features o...

Word Sense Disambiguation: A Case Study on the Granularity of Sense Distinctions

The paper presents a method for word sense disambiguation (WSD) based on parallel corpora. The method exploits recent advances in word alignment and word clustering based on automatic extraction of translation equivalents and is supported by a lexical ontology made of aligned wordnets for the languages in the corpora. The wordnets are aligned to the Princeton Wordnet, according to the principle...

Multilingual Word Sense Disambiguation Using Aligned Wordnets

Word Sense Disambiguation (WSD from now on) represents an established task within the Natural Language Processing community, aiming at finding the right sense of a word occurring in free running text through the use of a computer algorithm. Currently, most of the WSD approaches consider only monolingual texts, and, as such, they rely mainly on the discriminatory power of the words appearing in th...

Publication date: 2004